A semi-supervised approach for the integration of multi-omics data based on transformer multi-head self-attention mechanism and graph convolutional networks

Background and objectives Comprehensive analysis of multi-omics data is crucial for accurately formulating effective treatment plans for complex diseases. Supervised ensemble methods have gained popularity in recent years for multi-omics data analysis. However, existing research based on supervised learning algorithms often fails to fully harness the information from unlabeled nodes and overlooks the latent features within and among different omics, as well as the various associations among features. Here, we present a novel multi-omics integrative method MOSEGCN, based on the Transformer multi-head self-attention mechanism and Graph Convolutional Networks(GCN), with the aim of enhancing the accuracy of complex disease classification. MOSEGCN first employs the Transformer multi-head self-attention mechanism and Similarity Network Fusion (SNF) to separately learn the inherent correlations of latent features within and among different omics, constructing a comprehensive view of diseases. Subsequently, it feeds the learned crucial information into a self-ensembling Graph Convolutional Network (SEGCN) built upon semi-supervised learning methods for training and testing, facilitating a better analysis and utilization of information from multi-omics data to achieve precise classification of disease subtypes. Results The experimental results show that MOSEGCN outperforms several state-of-the-art multi-omics integrative analysis approaches on three types of omics data: mRNA expression data, microRNA expression data, and DNA methylation data, with accuracy rates of 83.0% for Alzheimer's disease and 86.7% for breast cancer subtyping. Furthermore, MOSEGCN exhibits strong generalizability on the GBM dataset, enabling the identification of important biomarkers for related diseases. Conclusion MOSEGCN explores the significant relationship information among different omics and within each omics' latent features, effectively leveraging labeled and unlabeled information to further enhance the accuracy of complex disease classification. It also provides a promising approach for identifying reliable biomarkers, paving the way for personalized medicine.


Introduction
The advent of cutting-edge sequencing technologies has facilitated the rapid acquisition of voluminous data from various omics domains, including mRNA expression, DNA methylation, and microRNA expression data.The utilization of diverse omics data enables the multifaceted representation of the biological processes underpinning complex diseases.In the early stages, the majority of researchers primarily employed traditional machine learning methods for the "unidimensional" analysis of single omics data in the study of disease mechanisms [1].mRNA gene expression was the most prevalent focus [2][3][4].However, for the intricacies of biological complexity, the analysis of single omics data remains inherently limited [5].Current research has shown that, in comparison to experiments conducted using single omics data, the utilization of multi-omics data sources permits a more comprehensive analysis of disease risk, prognosis, and enhances predictive capabilities [6][7][8][9].The integration analysis of multi-omics data supplements the information from various omics domains, compensating for the limitations of singular omics datasets and providing a more comprehensive research perspective for disease classification [10].
Some of the existing multi-omics studies have been rooted in unsupervised learning approaches.Chen Meng et al. [11] proposed multiple co-inertia analysis (MCIA) method.This method employs a covariance optimization criterion to simultaneously project multiple datasets (such as genes and proteins) onto a common one-dimensional space.It transforms distinct sets of features to a uniform scale, facilitating the extraction of features relevant to sample clusters.Michael J et al. [12] introduced the Joint and Individual Variation Explained (JIVE) method as an exploratory dimensionality reduction tool.JIVE dissects multi-omics datasets and integrates them to acquire comprehensive information regarding breast cancer.However, in recent years, due to the rapid advancement in medical technology and the accumulation of relevant data, the volume of biological features and trait data exhibited by individuals has increased significantly.Utilizing unsupervised learning is no longer sufficient to meet the demands of integrated analysis for multi-omics data.Instead, supervised learning methods in multi-omics, which incorporate sample label information, are increasingly applied in disease prognosis and prediction research.ZI-YI YANG et al. [13] proposed the Multi-Modal Self-Paced Learning (MSPL) algorithm for the integration of multi-omics data.This approach employs a sparse logistic regression classifier in cancer subtype classification and identifies latent biological features.Xu et al. [14] employed a novel hierarchical integrated deep flexible neural forest framework (HI-DFNForest) to integrate three types of omics data: DNA methylation, gene expression, and microRNA expression data, successfully classifying ovarian subtypes.Yang et al. [15] introduced the Subtype-GAN method, a deep adversarial learning approach with multiple inputs and outputs, which utilizes consistency clustering and Gaussian mixture models to identify molecular subtypes of tumor samples.Singh et al. [16] proposed Data Integration Analysis for Biomarker Discovery (DIABLO), a multivariate dimensionality reduction method that maximally utilizes covariance and latent components information within linear combinations of features from multiple omics sources for prediction.While these methods have demonstrated effectiveness, they have not fully considered the relationships among different omics data types and have overlooked interpatient correlations.Given the importance of leveraging both inter-patient correlations and inter-omics relationships, Wang T et al. [17] introduced a Multi-Omics Graph Convolutional Networks (MOGONET) algorithm.This algorithm employs cosine similarity to compute a patient correlation network as input for Graph Convolutional Networks (GCN) and explores cross-omics correlations in the label space using View Correlation Discovery Network (VCDN) after GCN output.Li et al. [18] proposed a multi-omics integration method based on graph convolutional networks (MOGCN).This method utilizes autoencoders for dimensionality reduction, integrates Copy Number Variations (CNV), mRNA, and Reverse Phase Protein Array (RPPA) data, and employs the results of Similarity Network Fusion (SNF) to construct a patient similarity network as GCN input.
In summary, while the aforementioned methods consider inter-patient correlations and inter-omics relationships and have, to a certain extent, improved the accuracy of complex disease classification, they still face certain challenges.Firstly, many data types have a limited number of labeled samples and a larger number of unlabeled samples.Traditional supervised learning methods do not directly leverage information from unlabeled nodes, and classic GCN methods do not utilize unlabeled node information directly during the training process [19], restricting information propagation and diminishing model generalization capabilities.Secondly, previous feature processing methods have not accounted for the unique subspaces of each omics data type and the multiple associations and dependencies among latent features within different omics data.This oversight may lead to results that are biased towards specific omics data types or particular features.Addressing these issues, we propose a novel ensemble learning model for analyzing multi-omics data.It fully exploits the correlations within the latent features of each omics data and inter-omics relationships, as well as the information from unlabeled nodes.The model is constructed by utilizing Transformer encoding modules to explore the potential advanced features and inherent relationships within each omics data and between different omics.Subsequently, it employs Similarity Network Fusion (SNF) to build a patient similarity fusion network.Finally, it employs Self-Ensembling Graph Convolutional Networks (SEGCN) for training, simultaneously utilizing labeled and unlabeled data to better capture the overall characteristics and underlying structures of the data, thereby enhancing model generalization capabilities.Additionally, this model can identify important omics features and biomarkers, offering interpretability and providing a research methodology for future clinical.

Methods
In this section, we shall provide a comprehensive exposition of the content pertaining to the multi-omics data integration learning model, MOSEGCN. Figure 1 illustrates the framework of MOSEGCN, which primarily comprises three components: the Transformer encoding module tailored for multi-omics features learning, the module dedicated to constructing a patient-fusion similarity network, and the ultimate SEGCN classification module.

Transformer
The Transformer model was initially employed in natural language processing [20].Over time, it underwent adaptations for image recognition and object detection, demonstrating its efficacy [21][22][23][24][25].The fundamental Transformer architecture comprises an input layer, multi-head selfattention blocks, normalization layers, feedforward layers, and residual connection layers.Essentially, it embodies an Encoder-Decoder framework [26].Key components within the Transformer model are the multi-head self-attention Fig. 1 MOSEGCN Framework mechanism and the autoencoder.The autoencoder is proficient at discerning latent features from input data, offering an effective approach to amalgamate distinct features [27].The multi-head self-attention mechanism is an enhanced algorithm building upon common attention mechanisms.Its virtue lies in its ability to apprehend the intrinsic correlations among various features across different positions and data points [28].This algorithm excels in capturing the inner relationships among diverse features and mitigating reliance on external information.It notably accentuates critical attributes for classifying related disease subtypes, with a particular emphasis on valuable insights from the test set, comprising unlabeled nodes.
Given that identical samples in the data encompass features from diverse omics, the experimental approach necessitates the full exploitation of concealed information within each omic, inter-omic latent feature information, and multifarious associations and dependencies among features.Consequently, this experimental method introduces a self-attention layer prior to the encoder's output layer.This layer comprehensively captures positional information from the input data, explores correlations between latent features within each omic and across different omics, and assesses the significance of features within each modality.The residual connections [29] facilitate the flow of information within the model, and normalization layers [30], positioned after the self-attention layer and before the feedforward network, enhance training stability and expedite convergence.

Autoencoder
The autoencoder is an unsupervised neural network model employing the backpropagation algorithm.Typically, it consists of two modules: the encoder and the decoder.The encoder maps input data into a lowerdimensional latent space, which is then mapped back to the original data space by the decoder [31].Given that both the latent features learned within each omics data's exclusive subspace and the latent features across different omics contribute to the model [32], and considering that the multi-head self-attention mechanism accounts for correlations among positions in input data, the experimental setup utilizes feature data concatenated from three modalities of the original input X ∈ R N xP , where N represents the number of samples, P = p 1 , p 2 • • • p i where p i represents the features possessed by the i-th modality.The entire process of the autoencoder can be represented as follows:and where X is the reconstruction representation with the same shape as X, θ e and θ d are the parameters of the encoder and decoder neural networks, respectively.Encoder(X, H is referred to as the latent representation of X, meaning the encoder maps N samples from a P-dimensional space to a K-dimensional space.Finally, the autoencoder trains the encoder and decoder by minimizing the reconstruction error to learn useful representations of data both within the same modality and across different modalities: argmin θ e , θ d �X − X� 2 F .In this experiment, only the encoder function block is used to obtain the ultimately valuable features.

The multi-head self-attention mechanism
The multi-head self-attention mechanism builds upon the foundation of the self-attention mechanism, introducing multiple attention heads to fully leverage input information in capturing various associations and dependencies within features.This enhances the model's comprehension of feature information [33].Since the features extracted by the autoencoder may contain some redundancy or irrelevant elements, potentially overlooking hidden information, this experiment employs the multi-head self-attention mechanism to further learn the internal correlations among features at various positions.This, in turn, assigns higher weights to crucial features in the context of cancer subtype classification, aiding the neural network in feature selection [34].In conclusion, the inclusion of the multi-head self-attention mechanism allows for the identification of pivotal features vital for predicting events based on critical information from different omics and individual omics data.The computational formula for multi-head attention is as follows: W o represents the output transformation matrix, h denotes the number of heads, and head i signifies the out- put of the i-th head Q i , K i and V i correspondingly emerge from the linear transformations of the latent vector H, with

SNF
The Similarity Network Fusion (SNF) [35] method employs pairwise correlations between samples to construct sample similarity matrices for each omics data type.In this experiment, the neighborhood size is set to 30, and the hyperparameter σ is assigned a value of 0.5.Distinct sample similarity networks are constructed for different (2) omics data types.Subsequently, leveraging the complementary information from different omics data types, the three distinct similarity networks obtained earlier are computed and fused, eliminating weak connections.Ultimately, a comprehensive view of the disease is established.
In this final comprehensive view, nodes represent samples, and edges indicate pairwise similarities between samples.The experiment implements this module in the PYTHON software using the SNFpy package, facilitating graph integration analysis.

Self-ensembling graph convolutional networks
To enhance model performance by fully leveraging the information from unlabeled nodes, our experiment employs the Self-Ensembling Graph Convolutional Networks (SEGCN) method [36].SEGCN represents a potent and highly reliable self-ensembling learning mechanism that combines GCN (Graph Convolutional Networks) and Mean Teacher in a semi-supervised task.GCN, a deep learning model designed for processing graph-structured data, operates on the fundamental principle of defining convolutional operations using the graph's adjacency matrix.However, the classical GCN algorithm, functioning as a localized spectral graph convolution with first-order approximations, explores only half of the unannotated information [19].
Mean Teacher [37] comprises both a teacher model and a student model.The inconsistency between the student's outputs under slight perturbations and the teacher model's outputs serves as a robust clue for classifying cancer subtypes in unlabeled nodes.In other words, unlabeled nodes can provide highly effective gradients under the supervision of consistency loss to train the model.In this mutually reinforcing process, both labeled and unlabeled sample information is effectively propagated for gradient-based training of GCN.The GCN model [38] obtains the output of a single convolutional layer by configuring the adjacency matrix A and X input features, ) and a teacher model f (� t ), s , t each with their respec- tive weights.Given labeled data . In this experiment, a normalized adjacency matrix A is constructed based on data relationships, x represents the labeled samples.f (A, x; � s ) c represents the predicted probabilities of the student classifier for the c classes, while y c represents the ground truth probabilities for the c classes.In a noisefree environment, the cross-entropy loss for labeled data under supervision is expressed as: In this experiment, model perturbation f ′(.) is achieved by adding only one dropout layer with a dropout rate set to 0.5.The unsupervised consistency loss penalizes the discrepancies between the student's predicted probabilities f ′(A, x; � s ) and those of the teacher f (A, x; � t ) .The formulation of the unsupervised consistency loss is as follows: The overall loss of SEGCN comprises both supervised and unsupervised losses, given as follows: x∈D L UD U ℓ cons Here, the parameter > 0 controls the relative impor- tance of the unsupervised loss in the overall loss.The weights of the teacher model are updated using the exponential moving average of the student's real-time weights, s , with a being the smoothing coefficient and s being the current step.α and are set to their default values in SEGCN, with the number of GCN layers set to 2 to demonstrate that the model achieves its best performance with two layers [36].

Results
In this section, the performance of the proposed MOSEGCN model is evaluated and compared with other state-of-the-art methods:1.Random Forest (RF): Constructing multiple decision trees and combining their predictions for final classification.2. k-Nearest Neighbors Classifier (KNN): Classifying based on the labels of neighboring samples for the sample to be predicted.3. L1 Regularized Linear Regression (Lasso): Considering relationships and differences between multiple categories simultaneously for multi-omics data fusion classification.4. XGBoost: Implementing a classifier based on gradient-boosted decision trees. 5. MoGCN: Utilizing autoencoders (AE) to learn multi-omics features for GCN classification.6. MOGONET: Jointly learning the specificity of omics and the correlation of cross-omics after pre-classification using GCN. 7. Combining Transformer encoding modules with GCN to create a novel model for cancer classification.8. Semi-Supervised SVM (S3VM): This is an extended approach to Support Vector Machines (SVM) that enhances model performance by simultaneously leveraging labeled and unlabeled data.9. SEGCN: A deep learning model designed for semi-supervised tasks, incorporating self-ensembling techniques to boost performance. (4) MOSEGCN is first compared with these nine methods on two benchmark cancer datasets.Subsequently, it is validated for applicability and effectiveness using a multi-omics dataset of glioblastoma multiforme, which contains four cancer subtypes and a total of 274 samples.Finally, the model's sensitivity analysis is employed to identify important biomarkers.

Data preparation
We utilized preprocessed benchmark multi-omics cancer datasets, namely ROSMAP and BRCA [17], to assess the performance of our experimental model across different cancer classification tasks.In particular, the BRCA dataset encompasses classification of invasive breast cancer (BRCA) PAM50 subtypes, including normal, basal, human epidermal growth factor receptor 2 (HER2)enriched, Luminal A subtype, and Luminal B subtype.
The multi-omics dataset for Glioblastoma Multiforme (GBM) was obtained from an open-access website accessed on May 16, 2023, at.This dataset comprises four files: three data groups (i.e., gene expression, DNA methylation expression, and microRNA expression), along with one clinical dataset.To effectively analyze multi-omics data, the following preprocessing steps were undertaken.First, samples common to all four data groups were selected, and features devoid of signals (zero mean) were further filtered.Second, the most significantly differentially expressed genes (the top 25% with the highest variance) were selected and MinMax-Scaler-transformed for subsequent analysis.Regarding microRNA expression data, due to the limited number of microRNA and features available, no selection was performed.The clinical dataset retained labels for the four cancer subtypes of the samples.The experiment utilized a 7:3 split for training and testing, repeated 30 times, with average measurement results reported.Table 1 provides a concise overview of the three datasets.

Hyper-parameter setting
The performance of MOSEGCN is directly influenced by the settings of hyperparameters, and one of these settings is the number of attention heads in the multi-head attention mechanism.Having a higher number of attention heads can potentially lead to increased computational complexity, training requirements, and memory consumption.Additionally, the interaction and integration of information between attention heads may become more intricate, making the model harder to optimize.Conversely, having a lower number of attention heads might limit the model's expressive power and feature extraction capabilities, preventing it from capturing complex relationships and patterns within multi-omics data.Selecting an appropriate number of attention heads requires striking a balance between the model's expressive capacity and computational complexity.Therefore, this study undertakes experimentation to fine-tune and determine the optimal number of attention heads.As depicted in Fig. 2, it becomes evident that when n_head = 4, the three datasets achieve the most outstanding classification performance within the model.

Dataset analysis
(Tables 2, and 3) present the test set accuracy results for the two benchmark cancer datasets, ROSMAP and BRCA.In the binary classification task for ROSMAP, the experiment employs accuracy (ACC), F1 score (F1), area under the receiver operating characteristic curve (AUC), Precision and Recall as evaluation metrics.For other multi-class datasets, accuracy (ACC), weighted F1 score (F1_weighted), macro F1 score (F1_macro), Precision and Recall are utilized.The experimental findings demonstrate that MOSEGCN outperforms in all benchmark test datasets.The accuracy rates for ROSMAP and BRCA reach 83.0% and 86.7%, respectively.Compared to the latest MOGONET method, MOSEGCN shows an improvement of 3.0% and 6.1% in accuracy for these datasets, The results in Tables 2, and 3 demonstrate that the combination of the Transformer encoding module and GCN outperforms the integrated model MOGCN [18], which utilizes AE and GCN modules, particularly in handling multiple omics data sets.Similarly, the evaluation metrics of the MOSEGCN model, incorporating the Transformer encoding module, surpass those of the semi-supervised model SEGCN.This underscores the effectiveness of the Transformer encoding module in integrating multiple omics data sets, showcasing its enhanced capability to capture complex relationships and latent features within the dataset.However, the combination method of Transformer encoding module and GCN does not outperform the evaluation metrics of MOSEGCN using both supervised loss and unsupervised loss utilizing unlabeled node information when only using supervised loss.This underscores the prowess of the SEGCN model within the MOSEGCN framework, effectively tapping into insights from unlabeled nodes to provide invaluable support during the model learning process.The symbiotic relationship between the Transformer encoding module and SEGCN not only highlights their collective strength but also opens up new horizons for pioneering advancements in the prediction and classification of intricate disease.MOSEGCN integrates three different types of omics data, and to demonstrate that MOSEGCN's classification performance surpasses that of single omics datasets, this experiment compares the classification results between single omics data and multi-omics data using MOSEGCN.As illustrated in Fig. 3, the results indicate that simultaneously processing all three omics data types   yields the best classification results.This method of integrating multi-omics datasets considers information from multiple perspectives and levels, thereby enhancing the accuracy of classification predictions.

Validation of MOSEGCN on the GBM dataset
To ascertain the generalizability of MOSEGCN, this experiment applied MOSEGCN to the GBM dataset, which encompasses four major subtypes: Classical, Mesenchymal, Proneural, and Neural [39].The results are presented in (Table 4).Table 4 reveals that the proposed MOSEGCN model performs exceptionally well on the GBM dataset, achieving an accuracy of 89.2%, a weighted F1 score of 89.0%, and a macro F1 score of 89.7%.This performance surpasses all other comparative methods.
These outcomes underscore the broad potential applicability of MOSEGCN for complex disease classification based on multi-omics data.

Identification of significant biomarkers
Sensitivity analysis is a method employed to understand how neural network models respond to variations in input data.Through sensitivity analysis, one can ascertain the contribution of input features to output predictions and discern which input features exert the most significant influence on the model's predictive outcomes [40,41].The importance of a node can be determined by its feature's standard deviation (variable sensitivity) and its contribution to the network, referred to as weight sensitivity [42].In the teacher model, this experiment employed sensitivity analysis for feature extraction.To achieve a stable feature extraction for the teacher model during training, the standard deviation σ i of each input node i's corresponding feature in each omics was calculated along with its connection weight W ij in the network.Every 400 epochs, the top 30 markers were extracted, and the extracted features were consolidated.Table 5 enumerates the biomarkers associated with the classification of BRCA and ROSMAP datasets.According to information from the KEGG database, we have discovered that in breast cancer, olfactory receptors such as OR11H6, OR1J4, OR4N5, and OR11G2O are associated with the olfactory transduction pathway.Olfactory receptors are not only expressed in the nasal cavity but also widely distributed throughout the body, playing significant physiological roles [43].This finding suggests that these sensory receptors may serve as novel, yet insufficiently studied targets in the development and progression of breast cancer.The Estrogen Signaling Pathway plays a crucial role in BRCA1 [44], with KRT16 and TFF1 being part of this pathway.Their expressions influence the biological characteristics of BRCA.Enhanced KRT16 expression is significantly correlated  with lower overall survival in metastatic breast cancer patients [45], while TFF1 is closely associated with bone metastasis in estrogen receptor (ER) + breast cancer [46].PSAT1, CA6 and CA9 are part of the Metabolic Pathways [47] and affect the growth and migration of breast cancer cells [48][49][50][51] MIR100 [52,53] and MIR124-2 [54,55] are two MicroRNAs within the same signaling pathway that induce apoptosis and cell cycle arrest in breast cancer cells through multiple genes.In terms of microRNA, hsa-mir-135b has been identified as a target for treating AGR2-expressing breast cancer with doxorubicin resistance [56].Gong et al. [57] formulated a prognostic risk feature model for predicting the prognosis of breast cancer patients.The results demonstrate a significant correlation between the expression levels of hsa-miR-190b and both unfavorable and favorable prognoses.In Alzheimer's disease, the Calcium Signaling Pathway is one of the major mechanisms.Disruption of calcium signaling may lead to synaptic defects and the accumulation of Aβ plaques and neurofibrillary tangles in AD [58,59].HRC, PTGER1 and CXCR4 are genes related to this pathway and have certain roles in the Calcium Signaling Pathway [60][61][62].The Neuroactive Ligand-Receptor Interaction pathway may play a role in neuroactivity regulation and cognitive functions [63][64][65], with PTGER1, TAC3, and APLN being part of this pathway, influencing the nervous system in patients [66][67][68].DDIT4 and ATG10 are involved in the Autophagy pathway, contributing to the clearance of cellular abnormalities by regulating the autophagic pathway [69][70][71][72].Regarding microRNA, hsa-miR-199b-5p is a potential candidate biomarker for its role in the interaction between diabetes and Alzheimer's disease [73,74] identified hsa-miR-133b as a potential biomarker for Alzheimer's disease (AD), playing a crucial role in constructing the ceRNA regulatory network associated with lncRNA.Hsa-miR-27a is likely a significant epigenetic biomarker in AD, participating in the regulation of the target gene SERPINA3, revealing its pivotal role in the disease's pathogenic mechanism [75].

Discussion
Multi-omics data provides a diverse range of molecular-level insights into biological organisms.The comprehensive analysis of multi-omics data yields more thorough and accurate biological information.Furthermore, it uncovers novel biological insights and associations, fostering innovation in complex disease research.It propels data-driven biological studies and the advancement of personalized medicine.With the rapid advancement of omics technologies and healthcare standards, meticulously annotated omics datasets are on the rise.However, in the real world, the cost of microRNA expression hsa-miR-1246,hsa-miR-1299,hsa-miR-200a,ebv-miR-BART8,hsa-miR-520e,hsa-miR-1275,hsv1-miR-H8,hsa-miR-2117,hsa-miR-199a-5p,hsa-miR-330-3p,hsa-miR-1260,hsa-miR-744,hsa-miR-891b,hsa-miR-1308,hsa-miR-522,hsv1-miR-H3,hsa-miR-2114,hsa-miR-133b,hsa-miR-27a,hsa-miR-509-3p,mcv-miR-M1-5p,hsa-miR-153,hsv1-miR-H1,hsa-miR-208a,hsa-miR-1248,hsa-miR-639,hsa-miR-518e,hsa-miR-194,hsa-miR-199b-5p,hsa-miR-381 extensively annotating data is often prohibitive, resulting in a small fraction of labeled data, leaving a substantial portion unlabeled.To address this challenge, this experiment introduces a deep learning-based semi-supervised multi-omics integration method for biomedical classification tasks.It effectively leverages both labeled and unlabeled data for improved classification of complex diseases.In this approach, we employed the Transformer encoding module for feature learning and integration.The Transformer network introduces a multi-head self-attention mechanism, allowing the model to establish connections between different positions.Moreover, this multihead self-attention mechanism permits the model to consider the relevance of positions in the input data when generating representations for each position.This enhances the model's ability to learn hidden information between different omics data types, which is crucial for effectively integrating multi-omics features.Consequently, in this experiment, we concatenated various omics data to learn useful information at each position, encompassing both intra-modality and inter-modality internal feature information.For the final cancer classification, we employs SEGCN, which adeptly harnesses labeled and unlabeled data, enhancing the model's generalization capacity.The necessity of both key components, the Transformer encoding module and GCN, is verified through their combined use.The generalizability of MOSEGCN is validated on the GBM renal cell carcinoma dataset, where it demonstrates the ability to identify meaningful biomarkers within each omics data, elucidating certain disease-related information.MOSEGCN exhibits strong capabilities in integrating multi-omics data for cancer classification.However, it has limitations, as this study exclusively employed three distinct types of multi-omics data.Multi-omics data with more than three types and heterogeneous data, such as imaging omics, remain unverified.These areas represent future directions for further research.

Conclusion
In conclusion, we introduces an innovative deep learning multi-omics integration model for the classification of complex diseases.Empirical evidence demonstrates the efficient utilization of the Transformer network to capture long-term dependencies in potential features within and across different modalities.Moreover, this experiment leverages the SEGCN module to thoroughly assimilate information from both labeled and unlabeled nodes, resulting in more precise classification outcomes.This integrated model is validated on three public datasets, outperforming state-of-the-art methods.Additionally, it identifies meaningful biomarkers within diverse omics data, further enhancing our understanding of disease mechanisms.In the future, we will explore different modalities and multiomics data integration techniques to further enhance the performance of complex disease classification tasks.

Fig. 2
Fig. 2 Evaluation Metrics as a Function of n _head Variation

Fig. 3
Fig. 3 Comparison of Multi-Omic Data and Single-Omic Data Classification Results Using the MOSEGCN Model

Table 1
Dataset Overview indicating outstanding classification performance in common complex diseases like breast cancer and Alzheimer's disease.MOSEGCN consists of two crucial components: the Transformer encoding module, which learns high-level features and their inherent correlations within and between different omics data types, and SEGCN, which employs labeled and unlabeled information for final classification.To validate the necessity of each component, this experiment combines the Transformer encoding module with GCN for classification purposes.

Table 2
Classification Results on the ROSMAP Dataset

Table 3
Classification Results on the BRCA Dataset

Table 4
Classification Results on the GBM Dataset

Table 5
Identified Important Biomarkers